Spectral Clustering of Graphs with General Degrees in the Extended Planted Partition Model
Abstract
In this paper, we examine a spectral clustering algorithm for similarity graphs drawn from a simple random graph model in which nodes are allowed to have varying degrees, and we provide theoretical bounds on its performance. The random graph model we study is the Extended Planted Partition (EPP) model, a variant of the classical planted partition model. The standard approach to spectral clustering of graphs is to compute the bottom k singular vectors or eigenvectors of a suitable graph Laplacian, project the nodes of the graph onto these vectors, and then run an iterative clustering algorithm on the projected nodes. However, a challenge in applying this approach to graphs generated from the EPP model is that unnormalized Laplacians do not work, and normalized Laplacians do not concentrate well when the graph has a number of low-degree nodes. We resolve this issue by introducing the notion of a degree-corrected graph Laplacian. For graphs with many low-degree nodes, degree correction has a regularizing effect on the Laplacian. Our spectral clustering algorithm projects the nodes of the graph onto the bottom k right singular vectors of the degree-corrected random-walk Laplacian and clusters the nodes in this subspace. We show guarantees on the performance of this algorithm, demonstrating that it outputs the correct partition under a wide range of parameter values. Unlike some previous work, our algorithm does not require access to any generative parameters of the model.
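The pipeline described in the abstract (build a degree-corrected random-walk Laplacian, project the nodes onto its bottom k right singular vectors, cluster in that subspace) can be sketched roughly as follows. The abstract does not define the degree-corrected Laplacian, so this sketch assumes the form I - (D + tau*I)^{-1} A, with the constant tau defaulting to the average degree. A plain k-means loop with farthest-first initialization stands in for the unspecified clustering step. All function names are illustrative and are not taken from the paper.

```python
import numpy as np


def degree_corrected_rw_laplacian(A, tau=None):
    """Assumed form: I - (D + tau*I)^{-1} A, with D the diagonal degree matrix.

    tau defaults to the average degree (an assumption of this sketch);
    it must be positive if the graph contains isolated nodes.
    """
    A = np.asarray(A, dtype=float)
    degrees = A.sum(axis=1)
    if tau is None:
        tau = degrees.mean()
    # Dividing row i by (d_i + tau) is left-multiplication by (D + tau*I)^{-1}.
    return np.eye(A.shape[0]) - A / (degrees + tau)[:, None]


def bottom_k_embedding(L, k):
    """Rows are the node coordinates in the bottom-k right singular subspace."""
    # numpy returns singular values in descending order, so the last k rows
    # of Vt are the right singular vectors with the smallest singular values.
    _, _, Vt = np.linalg.svd(L)
    return Vt[-k:].T


def kmeans(X, k, n_iter=50):
    """Lloyd's algorithm with deterministic farthest-first initialization."""
    centers = [X[0]]
    for _ in range(1, k):
        d2 = np.min([((X - c) ** 2).sum(axis=1) for c in centers], axis=0)
        centers.append(X[np.argmax(d2)])
    centers = np.array(centers)
    for _ in range(n_iter):
        d2 = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = d2.argmin(axis=1)
        new_centers = np.array([
            X[labels == j].mean(axis=0) if np.any(labels == j) else centers[j]
            for j in range(k)
        ])
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return labels


def spectral_cluster(A, k, tau=None):
    """Cluster the nodes of the graph with adjacency matrix A into k groups."""
    L = degree_corrected_rw_laplacian(A, tau)
    return kmeans(bottom_k_embedding(L, k), k)


# Toy input: two disjoint 5-node cliques (not an EPP sample).
block = np.ones((5, 5)) - np.eye(5)
A = np.block([[block, np.zeros((5, 5))], [np.zeros((5, 5)), block]])
labels = spectral_cluster(A, k=2)
```

On this toy input the two cliques receive different labels. Note that the sketch takes no generative parameters: it uses only the adjacency matrix, the number of clusters k, and a correction constant computed from the observed degrees. The dense SVD is only practical for small graphs.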
Similar resources
Finding Planted Partitions in Nearly Linear Time using Arrested Spectral Clustering
We describe an algorithm for clustering using a similarity graph. The algorithm (a) runs in O(n log n + m log n) time on graphs with n vertices and m edges, and (b) with high probability, finds all “large enough” clusters in a random graph generated according to the planted partition model. We provide lower bounds that imply that our “large enough” constraint cannot be improved much, even using a ...
Consistency of Spectral Hypergraph Partitioning under Planted Partition Model
Hypergraph partitioning lies at the heart of a number of problems in machine learning and network sciences. A number of algorithms exist in the literature that extend standard approaches for graph partitioning to the case of hypergraphs. However, theoretical aspects of such methods have seldom received attention in the literature as compared to the extensive studies on the guarantees of graph p...
Spectra of Random Graphs with Planted Partitions
Spectral methods for clustering are now standard, and there are many toy examples in which they can be seen to yield more sensible solutions than classical schemes like vanilla k-means. A more rigorous analysis of these methods has proved elusive, however, and has so far consisted mostly of probabilistic analyses for random inputs with planted clusterings. Such an analysis typically calls for ...
Clustering from General Pairwise Observations with Applications to Time-varying Graphs
We present a general framework for graph clustering and bi-clustering where we are given a general observation (called a label) between each pair of nodes. This framework allows a rich encoding of various types of pairwise interactions between nodes. We propose a new tractable and robust approach to this problem based on convex optimization and maximum likelihood estimators. We analyze our algo...
Regularized Spectral Clustering under the Degree-Corrected Stochastic Blockmodel
Spectral clustering is a fast and popular algorithm for finding clusters in networks. Recently, Chaudhuri et al. [1] and Amini et al. [2] proposed inspired variations on the algorithm that artificially inflate the node degrees for improved statistical performance. The current paper extends the previous statistical estimation results to the more canonical spectral clustering algorithm in a way t...
Publication date: 2012